(57) ABSTRACT

A method of stabilizing a video image displayed in multiple video fields of a video sequence includes the steps of: subdividing a selected area of a first video field into nested pixel blocks; determining horizontal and vertical translation of each of the pixel blocks in each of the pixel block subdivision levels from the first video field to a second video field; and determining translation of the image from the first video field to the second video field by determining a change in magnification of the image from the first video field to the second video field in each of horizontal and vertical directions, and determining shear of the image from the first video field to the second video field in each of the horizontal and vertical directions.

15 Claims, 3 Drawing Sheets 
VIDEO IMAGE STABILIZATION AND
REGISTRATION— PLUS 

ORIGIN OF THE INVENTION 

This invention was made by an employee of the United States Government and may be manufactured and used by or for the Government for Governmental purposes without the payment of royalties.

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates generally to video image processing methods and, in an embodiment described herein, more particularly provides a method of stabilizing and registering video images.

2. Description of Related Art 

Techniques presently exist for stabilizing video images. These techniques typically function to reduce or eliminate image translation (i.e., displacement) horizontally and vertically in a video sequence. In general, these techniques are very limited in effectiveness, since they are not able to compensate for image rotation or dilation. In addition, these techniques are sensitive to the effects of parallax, in which objects in the foreground and background are moving at different rates and/or directions. Furthermore, these techniques are typically able to determine image motion only to the nearest pixel.

Video image stabilization and other image enhancing techniques are also described in the following U.S. published applications: 2002/0064382, 2003/0090593 and 2003/0099410; and U.S. Pat. Nos. 5,784,175, 5,453,800, 5,327,232, 5,210,605, 4,924,306, 5,815,670, 5,742,710, 5,734,737, 5,686,973, 5,535,288, 5,528,703, 5,778,100, 5,748,784, 5,748,761, 5,745,605, 5,737,447, 5,734,753, 5,729,302, 5,703,966, 5,684,898, 5,581,308, 5,555,033, 5,488,675, 5,488,674, 5,473,364, 5,325,449, 5,259,040, 5,067,014, 4,797,942, 4,675,532, 4,937,666, 4,979,738, 5,144,423, 5,263,135, 5,276,513, 5,278,915, 5,321,748, 5,518,497, 5,534,925, 5,566,674, 5,627,915, 5,629,988, 5,635,994, 5,657,402, 5,717,793, 5,909,657, 5,920,657, 5,963,675, 6,037,988, 6,173,089, 6,571,021, 6,640,018, 6,373,970, 6,650,792, 5,943,450, 5,204,944, 5,050,225, 4,908,874, 4,893,258, 4,759,076, 4,672,680, 6,459,822 and 6,560,375.

The last two of these (U.S. Pat. Nos. 6,459,822 and 6,560,375), having the present inventor as a coinventor thereof, provide an advanced video image stabilization and registration technique which is very accurate, is capable of compensating for image rotation and dilation, and is capable of compensating for the effects of parallax. Unfortunately, however, this technique does not compensate for other forms of distortion, including different magnifications in different directions (as seen, for example, when an object is rotated toward or away from the camera and thus foreshortened in one direction) and shearing of the image (as seen in more complex object motion). This technique also uses prior knowledge of the shape (width-to-height ratio) of the image elements (pixels) both to determine the changes in the image using its limited image transformation and then to correct for those changes.

Therefore, it can be seen that it would be quite desirable to provide an improved video image stabilization and enhancement technique which can compensate for additional forms of image distortion, and which does not require advance knowledge of a pixel width-to-height ratio of the image. It is accordingly among the objects of the present invention to provide such a technique.
SUMMARY OF THE INVENTION

In carrying out the principles of the present invention, in accordance with an embodiment thereof, a method is provided for stabilizing and registering video images. The method compensates for more generalized forms of image distortion, and does not require advance knowledge of a pixel width-to-height ratio of an image.

In one aspect of the invention, displacement and dilation of an image from one video field to another in a video sequence are determined by choosing a key video field and selecting a key area of pixels within the key video field which contains the image. The key area is then subdivided into multiple levels of nested pixel blocks. Translation of the key area from the key field to a new video field is approximated by searching for an area in the new video field having a maximum correlation to the key area. The key area translation approximation is used as a starting point for determination of the translation of each of the pixel blocks in the largest pixel block subdivision from the key video field to the new video field. The translation of each of the pixel blocks in the largest pixel block subdivision is then used as a starting point for determination of the translation of each of the respective associated pixel blocks in the next smaller pixel block subdivision. This process is repeated until a determination of the translation of each of the pixel blocks in the smallest pixel block subdivision is made. Certain of the pixel blocks may be masked, for example, if a maximum correlation coefficient between one of the smallest pixel blocks and pixel blocks in the new video field is less than a predetermined value, in which case they are not considered in any subsequent calculations.

Translation of the image from the key video field to the new video field is found by determining a change in magnification of the image from the key video field to the new video field in each of horizontal and vertical directions, determining shear of the image from the key video field to the new video field in each of the horizontal and vertical directions, and correcting the horizontal and vertical translations of each of the pixel blocks in the smallest pixel block subdivision for the change in magnification and shear of the image from the key video field to the new video field. The corrected horizontal and vertical pixel block translations are then averaged to produce respective horizontal and vertical translations of the image from the key video field to the new video field.

These and other features, advantages, benefits and objects of the present invention will become apparent to one of ordinary skill in the art upon careful consideration of the detailed description of a representative embodiment of the invention hereinbelow and the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a flow chart representing a method embodying 
principles of the present invention; 

FIG. 2 is a flow chart representing substeps in a video frame 
pre-processing step of the method of FIG. 1; 

FIG. 3 is a flow chart representing substeps in a key area 
subdividing step of the method of FIG. 1; 

FIG. 4 is a flow chart representing substeps in a key area masking step of the method of FIG. 1;

FIG. 5 is a flow chart representing substeps in an image translation approximating step of the method of FIG. 1;

FIG. 6 is a flow chart representing substeps in a pixel block translation determining step of the method of FIG. 1;

FIG. 7 is a flow chart representing substeps in a magnification change determining step of the method of FIG. 1;

FIG. 8 is a flow chart representing substeps in an image shear determining step of the method of FIG. 1;

FIG. 9 is a flow chart representing substeps in an image translation determining step of the method of FIG. 1; and

FIG. 10 is a flow chart representing substeps in a subsequent video field pre-processing step of the method of FIG. 1.

DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

It is to be understood that the various embodiments of the 
present invention described herein may be utilized in various 
orientations, such as inclined, inverted, horizontal, vertical, 
etc., and in various configurations, without departing from the 
principles of the present invention. The embodiments are 
described merely as examples of useful applications of the 
principles of the invention, which is not limited to any specific 
details of these embodiments. 

Representatively illustrated in FIG. 1 is a method 10 which embodies principles of the present invention. In the following description of the method 10, reference is made to a standard video format well known to those skilled in the art, in which a video sequence includes multiple sequentially displayed video frames, with each video frame comprising two interlaced video fields, each of which presents an image as an arrangement of pixels having red, green and blue brightness levels, etc. However, it is to be clearly understood that the principles of the present invention are not limited to use with the standard video format, and that other formats, and other types of formats, may be utilized without departing from the principles of the present invention.

The method 10 includes steps 20, 30, 40, 50, 60, 70, 80, 90 and 100, and each of these steps includes substeps representatively depicted in the accompanying FIGS. 2, 3, 4, 5, 6, 7, 8, 9 and 10, respectively. Note that steps 50-100 are repeated, with these steps being performed for each video field in a video sequence, as described in further detail below.

Step 20 is a video frame pre-processing step. Because each video frame in the standard video format includes two interlaced video fields, one video field following the other in time, it is preferred to separate these video fields before beginning to analyze the motion of an image of interest therein.

In step 22, the video fields are extracted from each video frame of a video sequence. In the standard video format, one video field consists of the even-numbered horizontal lines, and the other video field consists of the odd-numbered horizontal lines, of each video frame, with the video fields being separated by 1/60th of a second in time. These horizontal lines are rows of pixels making up the image shown in the video frame.

When the video fields are separated out, each will have alternating blank lines therein, due to the absence of the corresponding other video field from its video frame. Therefore, in step 24, interpolation is used to fill in the missing lines in each video field. Video interpolation techniques are well known to those skilled in the art and will not be described further herein. Any such interpolation techniques may be utilized in keeping with the principles of the present invention.
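As one illustration of steps 22 and 24, the sketch below splits a frame into its even-line and odd-line fields and fills each blank line by averaging the lines directly above and below it. Line averaging is only an assumed choice among the interpolation techniques the text permits; the function names are hypothetical, frames are plain lists of rows, and each pixel is treated as a single number (the same code would be applied to each colour channel).

```python
def split_fields(frame):
    """Step 22 (sketch): split an interlaced frame into its two fields.

    Rows belonging to the other field are replaced by None, leaving the
    alternating blank lines that step 24 must fill.
    """
    even = [row if i % 2 == 0 else None for i, row in enumerate(frame)]
    odd = [row if i % 2 == 1 else None for i, row in enumerate(frame)]
    return even, odd


def fill_missing_lines(field):
    """Step 24 (sketch): fill each blank line by averaging its neighbours.

    A blank line at the top or bottom edge has only one neighbour, which
    is simply copied.
    """
    filled = []
    for i, row in enumerate(field):
        if row is not None:
            filled.append(list(row))
            continue
        above = field[i - 1] if i > 0 else None
        below = field[i + 1] if i + 1 < len(field) else None
        if above is not None and below is not None:
            filled.append([(a + b) / 2 for a, b in zip(above, below)])
        else:
            filled.append(list(above if above is not None else below))
    return filled
```

Linear averaging keeps the sketch short; a real implementation might use a better interpolator without changing the rest of the method.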

In step 26, each video field image is transformed into a 
gray-scale image by averaging together the red, green and 
blue brightness values of each pixel of the video field. Of 
course, step 20 could begin with a gray-scale (i.e., black and 
white in common parlance) video sequence, in which case 
step 26 would be unnecessary. 
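Step 26 reduces to an equal-weight mean of the three colour channels. A one-function sketch, assuming each pixel is an (r, g, b) tuple; the name `to_gray` is hypothetical:

```python
def to_gray(field_rgb):
    """Step 26 (sketch): gray level = mean of the R, G and B brightness values."""
    return [[(r + g + b) / 3 for (r, g, b) in row] for row in field_rgb]
```

Note that the text calls for a plain average, not the weighted luminance formulas used by some video standards.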

Step 30 is a key area subdividing step. This step produces 
groupings of pixels on multiple levels, such that each pixel 
group or block (other than the smallest size of pixel block) 
includes multiple smaller pixel blocks. In this sense, the pixel
blocks are “nested” with respect to each other. 

In step 32, a key field is selected. The key field is one of the video fields extracted in step 22. Preferably, the key field contains an image of interest, and at least a portion of that image displays an object, person, etc. that is to be stabilized in the video sequence. For example, if the video sequence shows an image of a moving car and it is desired to stabilize the video sequence so that the image of the car is relatively motionless, the key field will preferably be selected as one of the video fields which contains a relatively clear, centralized image of the car. The key field may be any one of the video fields in the video sequence, e.g., at the beginning, middle or end of the video sequence.

In step 34, a key area within the key field is selected. Preferably, the key area is a rectangular array of pixels and contains the specific image of interest about which it is desired to stabilize the video sequence, with a minimum of background, foreground, extraneous images, etc. Using the above example, the key area would preferably contain the image of the car and little else. The key area may be any group of pixels in the key field. For use as an example in the following further description of the method 10, the key area may be a rectangular group of pixels which is 358 pixels wide by 242 pixels high.

In step 36, the key area is preferably adjusted so that it contains a convenient whole number multiple of the smallest pixel block size into which the key area is to be subdivided. Thus, the key area is adjusted so that it can be conveniently subdivided into progressively smaller blocks of pixels. Using the above example, and assuming that the smallest desired pixel block size is a 15x15 block of pixels, the next larger pixel block size is a 30x30 block of pixels and the largest pixel block size is a 60x60 block of pixels, the key area may be adjusted to a size of 360x240 pixels. It will be readily appreciated that an array of 360x240 pixels may be conveniently subdivided into 60x60 pixel blocks, further subdivided into 30x30 pixel blocks, and still further subdivided into 15x15 pixel blocks.
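The adjustment in step 36 can be sketched as snapping each dimension to the nearest whole multiple of the largest block size. Two assumptions here are not stated in the text: rounding to the nearest multiple (which reproduces 358x242 to 360x240, growing one side and shrinking the other), and using the largest block rather than the smallest, which guarantees that every nested level tiles the area exactly. The function name is hypothetical.

```python
def adjust_key_area(width, height, largest_block=60):
    """Step 36 (sketch): snap key-area dimensions to the nearest whole
    multiple of the largest block size, never smaller than one block.

    Because each smaller block size divides the largest one, the result
    can also be tiled exactly by every smaller level.
    """
    def snap(n):
        nearest = (n + largest_block // 2) // largest_block * largest_block
        return max(largest_block, nearest)

    return snap(width), snap(height)
```

For example, `adjust_key_area(358, 242)` returns `(360, 240)`, the adjusted size used in the text.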

In step 38, the adjusted key area is subdivided into nested pixel blocks, that is, larger pixel blocks having smaller pixel blocks therein. Using the above example, there will be 24 of the 60x60 pixel blocks in the 360x240 adjusted key area, there will be 96 of the 30x30 pixel blocks (four 30x30 pixel blocks in each 60x60 pixel block) and there will be 384 of the 15x15 pixel blocks (four 15x15 pixel blocks in each 30x30 pixel block).
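The block counts above can be checked with a small sketch that lists the top-left corner of every block at each level and finds the enclosing block of the next larger level. It assumes square blocks aligned to the key area's top-left corner; the function names are hypothetical.

```python
def subdivide(width, height, sizes=(60, 30, 15)):
    """Step 38 (sketch): top-left corners of the square blocks at each level.

    `sizes` runs from largest to smallest; each size must divide the
    adjusted key area exactly (see step 36).
    """
    levels = {}
    for s in sizes:
        if width % s or height % s:
            raise ValueError(f"{width}x{height} is not a multiple of {s}")
        levels[s] = [(x, y) for y in range(0, height, s) for x in range(0, width, s)]
    return levels


def parent_corner(x0, y0, parent_size):
    """Corner of the next larger block that contains the block at (x0, y0)."""
    return (x0 // parent_size * parent_size, y0 // parent_size * parent_size)


levels = subdivide(360, 240)
print([len(levels[s]) for s in (60, 30, 15)])  # [24, 96, 384]
```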

In this example, the pixel block subdivisions have been selected to be 15x15 as the smallest, 30x30 as the next larger, and 60x60 as the largest; the pixel blocks therein are square; there are three levels of pixel blocks; and each pixel block subdivision has four times the number of pixel blocks as the next larger pixel block subdivision. However, it is to be clearly understood that other pixel block sizes, other pixel block shapes, other numbers of pixel block levels and other relationships between pixel block subdivisions may be used, without departing from the principles of the present invention. For instance, the smallest pixel block size could be 12x12, pixel blocks could be rectangular but not square, there could be four levels of nested pixel blocks, and one level could have nine times the number of pixel blocks as the next larger pixel block subdivision, while another level could have twelve times the number of pixel blocks as the next larger pixel block subdivision.

Step 40 is a data masking step in which selected pixel blocks are excluded from further consideration in the method 10. A data mask is constructed by producing an array of numbers in which each element of the array corresponds to one of the smallest pixel blocks of the key area. Using the above example of a 360x240 pixel key area and 15x15 smallest pixel blocks, the data mask would be a 24x16 array (24 blocks across by 16 blocks down). An element of the array is set to 1 if the corresponding pixel block is to be included in further calculations, and the element is set to 0 if the corresponding pixel block is to be excluded from further calculations.

In step 42, an operator is permitted to manually exclude 
pixel blocks which are not of interest. Using the above 
example of a key area containing an image of a car, the key 
area may also include images of other objects, such as objects 
in the foreground, background, etc., which are not germane to 
the analysis. Computational economy and accuracy are 
enhanced when the pixel blocks containing these extraneous 
images are masked by changing the corresponding elements 
in the data mask array to 0. 

In step 44, featureless pixel blocks are masked. This masking is done automatically and results when the scale of the variations in a pixel block is smaller than a predetermined value. The scale of the variations in a pixel block is given by the standard deviation of the average brightness levels of the individual pixels in the pixel block. Recall that the average brightness level of each pixel was determined in step 26 above.
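Steps 40 through 44 can be sketched as one function that starts with an all-ones mask, zeroes the blocks an operator excluded, and zeroes any block whose gray-level standard deviation is below a threshold. The threshold value, the use of the population standard deviation, and all names are assumptions for illustration; the text only calls the threshold "a predetermined value".

```python
import statistics


def build_mask(gray, block=15, min_std=2.0, excluded=()):
    """Steps 40-44 (sketch): one mask element per smallest pixel block.

    gray     -- key area as a list of rows of gray levels (from step 26)
    excluded -- (row, col) block indices an operator chose to mask (step 42)
    min_std  -- hypothetical featureless threshold for step 44
    Returns a list of block rows holding 1 (keep) or 0 (masked).
    """
    rows, cols = len(gray) // block, len(gray[0]) // block
    mask = [[1] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if (r, c) in excluded:
                mask[r][c] = 0
                continue
            pixels = [gray[y][x]
                      for y in range(r * block, (r + 1) * block)
                      for x in range(c * block, (c + 1) * block)]
            # Step 44: a small spread of brightness marks a featureless block.
            if statistics.pstdev(pixels) < min_std:
                mask[r][c] = 0
    return mask
```

For the 360x240 example this returns 16 rows of 24 elements, i.e. the 24x16 mask of step 40 stored row by row.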

Step 50 provides an approximation of the translation (horizontal and vertical shift or displacement) of the key area from the key field to a new field in the video sequence. This approximation is used to aid in the search for translation of the progressively smaller pixel blocks, as described below.

In step 52, a correlation coefficient between the key area and a corresponding area in the new video field is calculated by a process known as cross-correlation. Such calculation of a correlation coefficient between arrays of pixels is well known to those skilled in the art and results in a number which is related to the degree to which one array “matches” another array. Thus, the key area is cross-correlated with a corresponding area in the new video field, the corresponding area having the same shape and size as the key area and being located in the new field at the same position as the key area is located in the key field.
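The text leaves the correlation formula to the skilled reader. A common choice, assumed here, is the Pearson (zero-mean, normalized) correlation coefficient, which equals 1 for a perfect match regardless of overall brightness and contrast. A minimal sketch on plain lists of rows; the function name is hypothetical:

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient of two equal-size 2-D arrays.

    Returns 0.0 when either array is constant, where the coefficient
    is undefined.
    """
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys) or not xs:
        raise ValueError("arrays must be non-empty and the same size")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)
```

Because the means are subtracted and the result is normalized, a uniform brightening of the new field between fields does not lower the coefficient.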

In step 54, the key area is cross-correlated with other areas in the new video field, with the centers of the other areas being displaced relative to the center of the corresponding area used in step 52. For example, correlation coefficients may be calculated for areas 10 pixels to the right, 10 pixels to the left, 10 pixels up and 10 pixels down relative to the corresponding area used in step 52. If a correlation coefficient between the key area and one of these other areas is greater than the correlation coefficient between the key area and the corresponding area found in step 52, then there is an indication that the image has translated in the direction of the area having the increased correlation coefficient. If the correlation coefficient between the key area and the corresponding area found in step 52 is greater than the correlation coefficient of each of the other areas, but one of the other areas has a correlation coefficient greater than the remainder of the other areas, then there is an indication that the image has translated in the direction of the other area having the maximum correlation coefficient, but by less than the full displacement, so that it lies between the corresponding area and that other area.

In step 56, the search is refined based on the indications given by steps 52 and 54. Thus, the correlation coefficients calculated in steps 52 and 54 are used as a basis on which the search is refined. In general, the objective is to determine the area in the new field having the maximum correlation coefficient.
As depicted in FIG. 5, steps 54 and 56 are repeated, with correlation coefficients being calculated, the search refined, correlation coefficients calculated again, the search refined again, etc., until no further increase in correlation coefficient is achieved.

In step 58, the area in the new field having the maximum correlation to the key area is selected. This area is considered to be a rough approximation of the actual location of the image contained in the key area, as translated between the key field and the new field.
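Steps 52 through 58 amount to a local hill-climbing search. The sketch below is one possible reading, not the patented procedure: it probes the four neighbours at a step of 10 pixels (the example spacing in step 54), moves to the best one while the correlation increases, and halves the step when no neighbour improves, stopping at whole-pixel precision. It assumes a Pearson correlation coefficient, integer shifts only (no sub-pixel refinement), and hypothetical function names throughout.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient of two equal-size 2-D arrays."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def window(image, x0, y0, w, h):
    """w-by-h sub-array with top-left corner (x0, y0), or None if out of bounds."""
    if x0 < 0 or y0 < 0 or y0 + h > len(image) or x0 + w > len(image[0]):
        return None
    return [row[x0:x0 + w] for row in image[y0:y0 + h]]


def search_translation(key, new_field, x0, y0, start=(0, 0), step=10):
    """Steps 52-58 (sketch): integer shift (dx, dy) of `key` maximising correlation.

    key       -- template cut from the key field at top-left (x0, y0)
    new_field -- the new video field (list of rows)
    start     -- first guess for the shift
    Returns (best_shift, best_correlation).
    """
    h, w = len(key), len(key[0])

    def score(shift):
        win = window(new_field, x0 + shift[0], y0 + shift[1], w, h)
        return -2.0 if win is None else correlation(key, win)  # -2 is below any coefficient

    best, best_r = start, score(start)          # step 52
    while step >= 1:
        probes = [(best[0] + sx, best[1] + sy)  # step 54
                  for sx, sy in ((step, 0), (-step, 0), (0, step), (0, -step))]
        r, shift = max((score(p), p) for p in probes)
        if r > best_r:                          # step 56: move toward the better match
            best, best_r = shift, r
        else:                                   # no neighbour improves: refine the step
            step //= 2
    return best, best_r                         # step 58
```

Because only improving moves are taken, the search can stop at a local maximum if the correlation surface has several peaks.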

Step 60 is in large part a repeat of step 50, except that it is performed for each pixel block in each pixel block subdivision, beginning with the largest pixel block subdivision. As step 50 began with a calculation of correlation coefficient between the key area and the corresponding area in the new video field, step 60 begins with a calculation of correlation coefficient between one of the largest pixel blocks and a corresponding pixel block in the area selected in step 58. Using the above example, a 60x60 pixel block of the key area is first cross-correlated with a corresponding 60x60 pixel block in the area selected in step 58. The 60x60 pixel block of the key area is then cross-correlated with other 60x60 pixel blocks having respective centers which are displaced relative to the center of the corresponding 60x60 pixel block. The results of these calculations are then used to indicate the direction of translation of the 60x60 key area pixel block. The search is then refined and the process repeated to determine the translation of the 60x60 pixel block from the key area to the area selected in step 58 by finding the 60x60 pixel block having maximum correlation to the 60x60 key area pixel block. This process is then repeated for each of the other 60x60 pixel blocks in the key area, so that the translation of each 60x60 pixel block from the key field to the new field is determined.

Using the translation of its associated 60x60 pixel block as a first approximation, the translation of each 30x30 pixel block is determined. Then, using the translation of its associated 30x30 pixel block as a first approximation, the translation of each 15x15 pixel block is determined. Thus, step 60 of the method 10 progresses from the largest pixel block subdivision to the smallest pixel block subdivision, determining the translation of each pixel block within each subdivision, using the previously determined translation of the next larger associated pixel block as a starting point for determining the translation of each pixel block. Specific details of substeps 61-66 of step 60 are described in further detail below.

In step 61, the determination of each key field pixel block’s translation begins with the largest pixel block subdivision. Using the example given above, wherein the 360x240 pixel key area is first subdivided into 60x60 pixel blocks, further subdivided into 30x30 pixel blocks, and then further subdivided into 15x15 pixel blocks, the process of step 60 begins with the 60x60 pixel blocks. Of course, if other pixel block subdivisions are made, then the process of step 60 might begin with pixel blocks of another size. For instance, the key area could be initially subdivided into 40x40 pixel blocks, in which case step 61 would begin with 40x40 pixel blocks, instead of 60x60 pixel blocks.

In step 62, the correlation coefficient between a pixel block and the corresponding pixel block in the new field is calculated. For the largest pixel block subdivision, the corresponding pixel block in the new field is the pixel block of the key field translated by the same amount as the key area translated from the key field to the new field. In this manner, the translation of the key area from the key field to the new field, as determined in step 50, is used as a first approximation of the translation of each of the pixel blocks in the largest pixel block subdivision. Using the above example, the correlation coefficient would be calculated for a 60x60 pixel block of the key area and a 60x60 pixel block of the new field translated the same relative to the 60x60 pixel block of the key area as the key area translated from the key field to the new field.

In step 63, a search is performed for the pixel block in the new field having maximum correlation to the pixel block in the key area. This step is similar to steps 54, 56 and 58 described above, in which an area in the new field having maximum correlation to the key area is selected. In other words, step 63 is steps 54, 56 and 58 performed for an individual pixel block, rather than for the entire key area. Thus, correlation coefficients between the individual pixel block of the key area and pixel blocks displaced relative to the corresponding pixel block of the new field are calculated, the search is refined based on the results of these calculations, further correlation coefficients are calculated, etc., until the pixel block of the new field having the maximum correlation to the pixel block of the key area is determined.

In step 64, the translation of each pixel block is determined. Steps 62 and 63 have been described above as having been performed for a single pixel block of a pixel block subdivision. However, step 64 signifies that the translation of each pixel block in the pixel block subdivision is determined. This determination is made by performing steps 62 and 63 for each pixel block in the pixel block subdivision. Using the example given above, the key area contains 24 of the 60x60 pixel blocks. Thus, steps 62 and 63 would be performed 24 times for the largest pixel block subdivision, thereby permitting the translation of each of the 60x60 pixel blocks to be determined independently.

Note that it cannot be assumed that the pixel blocks are translated from the key field to the new field the same as the key area is translated from the key field to the new field, since rotation and change of magnification of the image from the key field to the new field may change the relative positionings of the pixel blocks. This is the reason the approximate translation of the key area from the key field to the new field as found in step 50 is used only as a starting point for determination of the translation of each pixel block of the largest pixel block subdivision.

In step 65, the process is advanced to the next smaller pixel block subdivision. Thus, after the translation of each pixel block in the largest pixel block subdivision is determined, the next smaller pixel block subdivision is evaluated to determine the translation of each pixel block therein. FIG. 6 shows that steps 62-65 are repeated, so that the translation of each pixel block in each pixel block subdivision is determined, progressing from the largest pixel block subdivision to the smallest pixel block subdivision.

Note that in step 62, when a correlation coefficient for a pixel block in a pixel block subdivision other than the largest pixel block subdivision is calculated, the corresponding pixel block in the new field is the pixel block of the key field translated the same as the associated pixel block of the next larger pixel block subdivision translated from the key field to the new field. In this manner, the translation of the associated next larger pixel block from the key field to the new field, as previously determined in step 64, is used as a first approximation of the translation of each of the pixel blocks in the subdivision. Using the above example, the correlation coefficient would be calculated for a 30x30 pixel block of the key area and a 30x30 pixel block of the new field translated the same relative to the 30x30 pixel block of the key area as its associated 60x60 pixel block translated from the key field to the new field.
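The coarse-to-fine bookkeeping of steps 61 through 65 can be sketched independently of the correlation search itself. In the sketch below, the per-block search is passed in as a callable (a hypothetical hook standing in for the step 62-63 search applied to one block), so the code shows only how each level's starting estimate comes from the key-area shift or from the enclosing parent block. It assumes square blocks aligned to the key area's top-left corner, with each size a multiple of the next smaller one.

```python
def coarse_to_fine(origins, sizes, key_shift, search):
    """Steps 61-65 (sketch): propagate translation estimates down the block levels.

    origins   -- {size: [(x0, y0), ...]} block corners in the key area
    sizes     -- block sizes from largest to smallest, each a multiple of the next
    key_shift -- key-area shift (dx, dy) found in step 50
    search    -- hypothetical callable search(size, corner, start) -> refined (dx, dy)
    Returns {size: {corner: (dx, dy)}}.
    """
    shifts = {}
    parent_size = None
    for size in sizes:                      # step 61: start with the largest blocks
        level = {}
        for corner in origins[size]:
            if parent_size is None:
                start = key_shift           # step 62, largest level: key-area shift
            else:
                # Step 62, smaller levels: start from the enclosing block's shift.
                parent = (corner[0] // parent_size * parent_size,
                          corner[1] // parent_size * parent_size)
                start = shifts[parent_size][parent]
            level[corner] = search(size, corner, start)  # steps 63-64
        shifts[size] = level
        parent_size = size                  # step 65: next smaller subdivision
    return shifts
```

Injecting the search keeps the propagation logic testable with a stand-in function; in use, `search` would cut the block out of the key field and run a correlation search seeded with `start`.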
After steps 62-65 have been performed for each pixel block subdivision (except that step 65 cannot be performed after the smallest pixel block subdivision has been evaluated), the result is that the translation of each pixel block in each pixel block subdivision has been determined. This result is very beneficial, since the translations of the smallest pixel blocks may now be used to more precisely determine the translation of the key area from the key field to the new field, and may further be used to determine rotation, dilation, and shearing of the image between the key field and the new field.

However, it is recognized that the correlation between a pixel block of the key field and a pixel block of the new field may be very low for a variety of reasons. For example, a particular pixel block of the new field which is a translated pixel block of the key area may be obscured due to the presence of an object in the image foreground. Thus, in step 66, a pixel block in the smallest pixel block subdivision is masked when its maximum correlation to pixel blocks in the new field, as determined in step 63, is below a predetermined value. For example, if the maximum calculated correlation coefficient for a pixel block in the smallest pixel block subdivision is less than 0.7, the pixel block may be excluded by setting its element in the data mask described in step 40 above to 0. If a pixel block is masked, it is not considered in any further calculations in the method 10.
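Step 66 is a simple element-wise update of the data mask. A sketch using the 0.7 example threshold, assuming the step 63 maxima are held in an array shaped like the mask; the function name is hypothetical:

```python
def mask_low_correlation(mask, max_corr, threshold=0.7):
    """Step 66 (sketch): zero the mask where a smallest block's best
    correlation (from step 63) is below `threshold`; blocks that are
    already masked stay masked."""
    return [[1 if m and r >= threshold else 0 for m, r in zip(mask_row, corr_row)]
            for mask_row, corr_row in zip(mask, max_corr)]
```

A block whose best coefficient is exactly 0.7 is kept, matching the "less than 0.7" wording.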

Since step 60 provides a measure of the translation of each pixel block in the smallest pixel block subdivision from the key field to the new field, this information may be used to determine whether the pixel blocks have spread apart or contracted relative to each other, whether the pixel blocks have rotated relative to each other, and whether there is shearing of the image in the vertical or horizontal directions. In general, a two-dimensional image transformation is described by an “affine transformation” that involves six quantities (A-F) such that:

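$$\begin{aligned} x_{\text{new}} &= A + B\,x_{\text{old}} + C\,y_{\text{old}} \\ y_{\text{new}} &= D + E\,x_{\text{old}} + F\,y_{\text{old}} \end{aligned} \tag{1}$$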

where $x_{\text{old}}$, $y_{\text{old}}$ are the coordinates of a pixel block in a previous or key field, and $x_{\text{new}}$, $y_{\text{new}}$ are the coordinates of the pixel block in a new field.

In the method 10, the image transformation is described by the following:

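$$\begin{aligned} x_{\text{new}} &= M_x\left[(x_{\text{old}} + \Delta x) + (y_{\text{old}} + \Delta y)\,S_x\right] \\ y_{\text{new}} &= M_y\left[(y_{\text{old}} + \Delta y) + (x_{\text{old}} + \Delta x)\,S_y\right] \end{aligned} \tag{2}$$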

where $\Delta x$ is the horizontal translation, $\Delta y$ is the vertical translation, $M_x$ is the magnification in the horizontal direction, $M_y$ is the magnification in the vertical direction, $S_x$ is the shear in the horizontal flow, and $S_y$ is the shear in the vertical flow. This can be rewritten in the form of equation (1) above using the following substitutions:

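$$\begin{aligned} A &= M_x(\Delta x + \Delta y\,S_x) \\ B &= M_x \\ C &= M_x S_x \\ D &= M_y(\Delta y + \Delta x\,S_y) \\ E &= M_y S_y \\ F &= M_y \end{aligned} \tag{3}$$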
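The substitutions in equation (3) can be checked numerically: evaluating equation (2) directly, and equation (1) with coefficients taken from equation (3), should give the same transformed point. The parameter values and function names below are arbitrary illustrations, not values from the method.

```python
def transform_eq2(x, y, dx, dy, mx, my, sx, sy):
    """Equation (2): translation, then per-axis shear and magnification."""
    return (mx * ((x + dx) + (y + dy) * sx),
            my * ((y + dy) + (x + dx) * sy))


def affine_coefficients(dx, dy, mx, my, sx, sy):
    """Equation (3): coefficients A-F expressed in the method 10 parameters."""
    return (mx * (dx + dy * sx), mx, mx * sx,
            my * (dy + dx * sy), my * sy, my)


def transform_eq1(x, y, coeffs):
    """Equation (1): general six-parameter affine transformation."""
    a, b, c, d, e, f = coeffs
    return a + b * x + c * y, d + e * x + f * y


params = (3.0, -2.0, 1.01, 0.98, 0.02, -0.01)  # dx, dy, Mx, My, Sx, Sy (illustrative)
coeffs = affine_coefficients(*params)
for x, y in [(0.0, 0.0), (120.0, 45.0), (-30.0, 200.0)]:
    x2, y2 = transform_eq2(x, y, *params)
    x1, y1 = transform_eq1(x, y, coeffs)
    assert abs(x1 - x2) < 1e-9 and abs(y1 - y2) < 1e-9
```

Expanding $M_x[(x_{\text{old}} + \Delta x) + (y_{\text{old}} + \Delta y)S_x]$ term by term gives exactly $A + B\,x_{\text{old}} + C\,y_{\text{old}}$ with the values above, and likewise for $y_{\text{new}}$.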

Knowledge of the shape of the pixels in the image (number of pixels in the width and height of the image) is not required, in part because the shear is determined separately for the horizontal and vertical directions. In the method 10, pixel counts are used for the coordinates and displacements of the pixels, rather than physical units related to the shape of the pixels in the image. The displacements of the pixel blocks within the area of interest in the image produce a flow map with displacements in the horizontal and vertical directions ($\delta x$, $\delta y$) defined by:

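$$\begin{aligned} \delta x(x_{\text{old}}, y_{\text{old}}) &= x_{\text{new}}(x_{\text{old}}, y_{\text{old}}) - x_{\text{old}} \\ \delta y(x_{\text{old}}, y_{\text{old}}) &= y_{\text{new}}(x_{\text{old}}, y_{\text{old}}) - y_{\text{old}} \end{aligned} \tag{4}$$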

Step 70 is a magnification determination step in which the change in magnification of the image from the key field to the new field is determined. In step 72, the difference in horizontal translation is calculated for each pair of pixel blocks in each row of the smallest pixel block subdivision. Using the example above, for the 360x240 pixel key area and 15x15 pixel blocks in the smallest pixel block subdivision, there are twenty-four 15x15 pixel blocks in each row of the key area. The change in horizontal translation for each pair of pixel blocks, divided by the distance between the pixel block centers, is calculated for each row of the key area. This calculation gives the horizontal change in magnification for each pixel block pair.

For example, if a pixel block on a row moves to the left 10
pixels from the key field to the new field, while a pixel block
300 pixels away moves to the left 13 pixels from the key field
to the new field, the horizontal change in magnification is 1%
(a 3 pixel difference in horizontal translation over a 300 pixel
distance). As described above, masked pixel blocks are
excluded from these calculations.

Thus, a single pair of pixel blocks at horizontal positions $x_1$
and $x_2$ on a row of pixels at vertical position $y_{\text{old}}$ will
contribute an estimate of $M_x$ as follows:

$$
M_x = 1 + \frac{dx(x_1, y_{\text{old}}) - dx(x_2, y_{\text{old}})}{x_1 - x_2}
\qquad (5)
$$
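Equation (5) applied to the numerical example of step 72 can be sketched as follows. The text gives only the magnitude of the change, so the block positions, and which of the two blocks moves farther, are assumptions; with this choice $M_x$ comes out slightly below 1, meaning the image shrinks horizontally.

```python
def mx_pair_estimate(x1, dx1, x2, dx2):
    """Equation (5): horizontal magnification from two blocks on one row.
    dx1 and dx2 are the horizontal displacements of the blocks at x1 and x2."""
    return 1.0 + (dx1 - dx2) / (x1 - x2)


# Example from the text: blocks 300 pixels apart move left (negative dx)
# by 10 and 13 pixels. Positions 30 and 330 are assumed for illustration.
m = mx_pair_estimate(x1=30.0, dx1=-10.0, x2=330.0, dx2=-13.0)
print(f"M_x = {m:.4f}, change = {abs(m - 1):.2%}")  # → M_x = 0.9900, change = 1.00%
```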

The magnification in the horizontal direction ($M_x$) is determined
by dividing the relative horizontal displacement
between pairs of pixel blocks in each of the rows of pixel
blocks by the distance between the respective centers of the
pair of pixel blocks, and by averaging together the results,
giving greater weight to those pixel block pairs with larger
distances between them and more consistent results. The
magnification changes for more widely spaced apart pixel
block pairs are weighted more than those for relatively
closely spaced pixel block pairs, since widely spaced apart
pixel blocks are more sensitive to changes in magnification.
Additionally, individual pixel block pair magnification
changes may be excluded from the weighted average if their
values are significantly different from the average. For
example, a pixel block pair magnification change value may
be excluded from the weighted average calculation if it is
more than one standard deviation from the average of the
magnification changes. In this manner, erroneous magnification
change calculations do not affect the weighted average.
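The distance-weighted averaging with one-standard-deviation rejection described above might be sketched as follows. The text does not define how "more consistent results" are weighted, so this sketch weights each pair only by its center-to-center distance and measures the standard deviation about the plain (unweighted) mean; both are assumptions. The same routine would apply unchanged to the column pairs of equation (6) and to the shear estimates of equations (7) and (8).

```python
from itertools import combinations
import statistics


def pair_estimates(positions, displacements):
    """One equation-(5) estimate per block pair on a row, paired with the
    distance between the two block centers (used later as the weight)."""
    pairs = []
    for (p1, d1), (p2, d2) in combinations(zip(positions, displacements), 2):
        pairs.append((1.0 + (d1 - d2) / (p1 - p2), abs(p1 - p2)))
    return pairs


def robust_weighted_mean(pairs):
    """Distance-weighted mean of (estimate, weight) pairs, after discarding any
    estimate more than one population standard deviation from the plain mean."""
    values = [v for v, _ in pairs]
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    # Fall back to all pairs if rounding ever leaves nothing inside one sigma.
    kept = [(v, w) for v, w in pairs if abs(v - mean) <= sd] or pairs
    total = sum(w for _, w in kept)
    return sum(v * w for v, w in kept) / total


if __name__ == "__main__":
    demo = [(1.01, 300.0), (1.011, 150.0), (1.009, 240.0), (1.5, 15.0)]
    print(round(robust_weighted_mean(demo), 4))  # → 1.0099
```

In the demonstration, the estimate of 1.5 (from a closely spaced pair) lies more than one standard deviation from the mean and is dropped before the weighted average is formed.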

In step 74, the difference in vertical translation is calculated
for each pixel block column pair in the smallest pixel block
subdivision. Using the example given above, for the 360x240
pixel key area and 15x15 pixel blocks in the smallest pixel
block subdivision, there are sixteen 15x15 pixel blocks in
each column of the key area. The difference in vertical translation
for each pair of pixel blocks, divided by the distance
between the pixel block centers, is calculated for each column
of the key area. This calculation gives the vertical change in
magnification for each pixel block pair, similar to the manner
in which the horizontal change in magnification for pixel
block pairs in the key area rows is calculated in step 72.
Thus, the magnification in the vertical direction ($M_y$) is
determined by dividing the relative vertical displacement
between each pair of the pixel blocks in each of the columns
of pixel blocks by the distance between the respective centers
of the pair of pixel blocks, and by averaging together the
results, giving greater weight to those blocks with larger
distances and more consistent results. A single pair of pixel
blocks at vertical positions $y_1$ and $y_2$ on a column of pixels at
horizontal position $x_{\text{old}}$ will contribute an estimate of $M_y$ as
follows:

$$
M_y = 1 + \frac{dy(x_{\text{old}}, y_1) - dy(x_{\text{old}}, y_2)}{y_1 - y_2}
\qquad (6)
$$

The magnification changes for more widely spaced apart 
pixel block pairs are weighted more than those for relatively 
closely spaced pixel block pairs, and individual pixel block 
pair magnification changes may be excluded from the 
weighted average if their values are significantly different 
from the average. For example, a pixel block pair magnifica- 
tion change value may be excluded from the weighted aver- 
age calculation if it is more than one standard deviation from 
the average of the magnification changes. 

Note that, in contrast to prior image stabilization and registration
methods, the individual horizontal and vertical magnification
changes are not combined into a single weighted average. Instead,
separate magnification changes are used for the horizontal
and vertical directions. This provides for situations where an
object in the image is rotated relative to the camera, and thus
appears foreshortened in one direction but not the other.

Step 76 in FIG. 7 represents the individual horizontal and 
vertical magnification change calculations described above. 

Step 80 is a shear determination step. This step is somewhat
similar to step 70 in that changes in translation of pixel block
pairs from the key field to the new field are used to calculate
shear in the horizontal and vertical flows ($S_x$, $S_y$).

In step 82, the difference in horizontal translation is calcu- 
lated for each pixel block pair in each of the smallest pixel 
block subdivision columns. The difference in horizontal 
translation for each pair of pixel blocks, divided by the dis- 
tance between the pixel block centers, is calculated for each 
column of the key area. This calculation gives the horizontal 
shear for each pixel block pair in each column. For example, 
if a pixel block in a column moved to the right 1 pixel while 
another pixel block 300 pixels away in the column moved to 
the left 2 pixels from the key field to the new field, the 
difference in horizontal translation would be 3 pixels and the 
horizontal shear of the pixel block pair would be 1/100 (a 3 
pixel difference in displacement over a 300 pixel distance 
gives a tangent of 3/300, equivalent to an angle of 0.57°). 

Thus, the shear in the horizontal flow ($S_x$) is determined by
dividing the relative horizontal displacement between each
pair of the pixel blocks in each of the columns of pixel blocks
by the distance between the respective centers of the pair of
pixel blocks, and by averaging together the results, giving
greater weight to those pixel block pairs with larger distances
between them and more consistent results. The calculated
shear values for more widely spaced apart pixel block pairs
are weighted more than those for relatively closely spaced
pixel block pairs, since widely spaced apart pixel blocks are
more sensitive to shear. Additionally, individual pixel block
pair shear calculations may be excluded from the weighted
average if their values are significantly different from the
average. For example, a pixel block pair shear calculation may
be excluded from the weighted average calculation if it is
more than one standard deviation from the average of the
shear calculations. In this manner, erroneous shear calculations
do not affect the weighted average.
A single pair of pixel blocks at vertical positions $y_1$ and $y_2$
on a column of pixel blocks at horizontal position $x_{\text{old}}$ will
contribute an estimate of $S_x$ as follows:

$$
S_x = \frac{dx(x_{\text{old}}, y_1) - dx(x_{\text{old}}, y_2)}{M_x (y_1 - y_2)}
\qquad (7)
$$
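The shear example given for step 82 can be reproduced with equation (7). The block positions, which of the two blocks moves right, and $M_x = 1$ are assumptions made for this sketch.

```python
import math


def sx_pair_estimate(y1, dx1, y2, dx2, mx):
    """Equation (7): horizontal shear from two blocks in one column.
    dx1 and dx2 are the horizontal displacements of the blocks at y1 and y2."""
    return (dx1 - dx2) / (mx * (y1 - y2))


# Example from the text: two blocks 300 pixels apart in a column, one moving
# right 1 pixel and the other left 2 pixels (assumed placement, M_x = 1).
s = sx_pair_estimate(y1=300.0, dx1=1.0, y2=0.0, dx2=-2.0, mx=1.0)
angle = math.degrees(math.atan(s))
print(f"S_x = {s:.4f}, angle = {angle:.2f} deg")  # → S_x = 0.0100, angle = 0.57 deg
```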

In step 84, the difference in vertical translation is calculated
for each pixel block pair in each of the smallest pixel block
subdivision rows. The difference in vertical translation for
each pair of pixel blocks, divided by the distance between the
pixel block centers, is calculated for each row of the key area.
This calculation gives the shear in the vertical flow ($S_y$) for
each pixel block pair in each row.

Thus, the shear in the vertical flow is determined by divid- 
ing the relative vertical displacement between each pair of the 
pixel blocks in each of the rows of pixel blocks by the distance 
between the respective centers of the pair of pixel blocks, and 
by averaging together the results, giving greater weight to 
those pixel block pairs with larger distances between them 
and more consistent results. The calculated shear values for 
more widely spaced apart pixel block pairs are weighted more 
than those for relatively closely spaced pixel block pairs, and 
individual pixel block pair shear calculations may be 
excluded from the weighted average if their values are sig- 
nificantly different from the average. For example, a pixel 
block pair shear calculation may be excluded from the 
weighted average calculation if it is more than one standard 
deviation from the average of the shear calculations. 

A single pair of pixel blocks at horizontal positions $x_1$ and
$x_2$ on a row of pixel blocks at vertical position $y_{\text{old}}$ will
contribute an estimate of $S_y$ as follows:

$$
S_y = \frac{dy(x_1, y_{\text{old}}) - dy(x_2, y_{\text{old}})}{M_y (x_1 - x_2)}
\qquad (8)
$$
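For noise-free displacements generated exactly by equation (2), equations (5) through (8) recover the four parameters from just two blocks per row and two per column. The sketch below checks this; the parameter values and block positions are arbitrary choices for illustration, and with real block-matching output the per-pair estimates would instead be combined by the weighted averaging described above.

```python
def model(x, y, mx, my, sx, sy, tx, ty):
    """Equation (2): key-field position -> new-field position."""
    return mx * ((x + tx) + (y + ty) * sx), my * ((y + ty) + (x + tx) * sy)


def displacement(x, y, params):
    """Equation (4): (dx, dy) of a block centered at (x, y)."""
    x_new, y_new = model(x, y, **params)
    return x_new - x, y_new - y


def estimate_scale_and_shear(params, row=(7.5, 352.5), y_row=120.0,
                             col=(7.5, 232.5), x_col=180.0):
    """Apply equations (5)-(8) to one row pair and one column pair of blocks."""
    x1, x2 = row
    y1, y2 = col
    dx_a, dy_a = displacement(x1, y_row, params)   # row pair, first block
    dx_b, dy_b = displacement(x2, y_row, params)   # row pair, second block
    dx_c, dy_c = displacement(x_col, y1, params)   # column pair, first block
    dx_d, dy_d = displacement(x_col, y2, params)   # column pair, second block
    mx = 1.0 + (dx_a - dx_b) / (x1 - x2)           # equation (5)
    my = 1.0 + (dy_c - dy_d) / (y1 - y2)           # equation (6)
    sx = (dx_c - dx_d) / (mx * (y1 - y2))          # equation (7)
    sy = (dy_a - dy_b) / (my * (x1 - x2))          # equation (8)
    return mx, my, sx, sy


if __name__ == "__main__":
    truth = dict(mx=1.03, my=0.97, sx=0.015, sy=-0.02, tx=5.0, ty=-3.0)
    print(estimate_scale_and_shear(truth))
```

Note that the translations $\Delta x$ and $\Delta y$ cancel out of every difference, which is why magnification and shear can be estimated before the translation is refined in step 90.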

Note that, in contrast to prior video image stabilization and 
registration methods, the overall rotation of the image from 
the key field to the new field is not calculated. Instead, sepa- 
rate shears are obtained for the horizontal and vertical direc- 
tions. This provides for situations in which an object in the 
image is rotated toward or away from the camera and appears 
foreshortened in a particular direction. This also provides for 
pixel shapes with arbitrary or unknown width-to-height 
ratios. A simple rotation of the image in physical coordinates
will produce different shears ($S_x$ and $S_y$) that automatically
account for the unknown pixel shape.

Step 86 in FIG. 8 represents the individual horizontal and 
vertical shear calculations described above. 

Step 90 is an image translation determination step. Recall 
that an approximation of the image translation from the key 
field to the new field was determined in step 50. However, 
since steps 60, 70 and 80 above have provided determinations 
of the individual translations of the smallest pixel block sub- 
division pixel blocks, the change in magnification in the hori- 
zontal and vertical directions and the shear in the horizontal 
and vertical directions in the key area from the key field to the 
new field, a precise determination of the key area translation 
may now be made. 

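In step 92, the horizontal translation ($\Delta x$) is determined by
correcting the translation determined in step 60 for each of the
smallest pixel block subdivision pixel blocks for the magnifications
and shears determined in steps 70 and 80, and by
taking a weighted average of the corrected displacements. A
single pixel block at position $x_{\text{old}}$, $y_{\text{old}}$ will contribute an
estimate of $\Delta x$ as follows:

$$
\Delta x = \frac{\left[x_{\text{old}} + dx(x_{\text{old}}, y_{\text{old}})\right]/M_x - S_x\left[y_{\text{old}} + dy(x_{\text{old}}, y_{\text{old}})\right]/M_y}{1 - S_x S_y} - x_{\text{old}}
\qquad (9)
$$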

In step 94, the vertical translation ($\Delta y$) is determined by
correcting the translation determined in step 60 for each of the
smallest pixel block subdivision pixel blocks for the magnifications
and shears determined in steps 70 and 80, and by
taking a weighted average of the corrected displacements. A
single pixel block at position $x_{\text{old}}$, $y_{\text{old}}$ will contribute an
estimate of $\Delta y$ as follows:

$$
\Delta y = \frac{\left[y_{\text{old}} + dy(x_{\text{old}}, y_{\text{old}})\right]/M_y - S_y\left[x_{\text{old}} + dx(x_{\text{old}}, y_{\text{old}})\right]/M_x}{1 - S_x S_y} - y_{\text{old}}
\qquad (10)
$$
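Equations (9) and (10) follow from solving equation (2) for $\Delta x$ and $\Delta y$, given one block's measured position in the new field. The sketch below implements them and averages the per-block estimates; the uniform default weights are an assumption, since the text does not specify the weighting used in steps 92 and 94. The `model` helper only synthesizes test data.

```python
def model(x, y, mx, my, sx, sy, tx, ty):
    """Equation (2), used here only to synthesize test displacements."""
    return mx * ((x + tx) + (y + ty) * sx), my * ((y + ty) + (x + tx) * sy)


def translation_estimate(x_old, y_old, dx, dy, mx, my, sx, sy):
    """Equations (9) and (10): (delta_x, delta_y) implied by one block whose
    center at (x_old, y_old) was displaced by (dx, dy)."""
    x_new, y_new = x_old + dx, y_old + dy
    det = 1.0 - sx * sy
    tx = (x_new / mx - sx * y_new / my) / det - x_old
    ty = (y_new / my - sy * x_new / mx) / det - y_old
    return tx, ty


def mean_translation(blocks, mx, my, sx, sy, weights=None):
    """Weighted average of per-block estimates over (x, y, dx, dy) tuples.
    Uniform weights are used when none are given (an assumption)."""
    weights = weights or [1.0] * len(blocks)
    ests = [translation_estimate(x, y, dx, dy, mx, my, sx, sy)
            for x, y, dx, dy in blocks]
    total = sum(weights)
    return (sum(w * e[0] for w, e in zip(weights, ests)) / total,
            sum(w * e[1] for w, e in zip(weights, ests)) / total)


if __name__ == "__main__":
    p = dict(mx=1.02, my=0.99, sx=0.01, sy=-0.005, tx=6.0, ty=-4.0)
    blocks = []
    for x, y in [(7.5, 7.5), (352.5, 7.5), (180.0, 120.0), (7.5, 232.5)]:
        xn, yn = model(x, y, **p)
        blocks.append((x, y, xn - x, yn - y))
    print(mean_translation(blocks, p["mx"], p["my"], p["sx"], p["sy"]))
```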

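In step 96, the overall horizontal and vertical translation for
the center of the key area is calculated using the values for $M_x$,
$M_y$, $S_x$, $S_y$, $\Delta x$ and $\Delta y$ determined above through the inverse
affine transformation as follows:

$$
\begin{aligned}
x_{\text{old}} &= \frac{x_{\text{new}}}{M_x(1 - S_x S_y)} - \frac{S_x\, y_{\text{new}}}{M_y(1 - S_x S_y)} - \Delta x \\
y_{\text{old}} &= \frac{y_{\text{new}}}{M_y(1 - S_x S_y)} - \frac{S_y\, x_{\text{new}}}{M_x(1 - S_x S_y)} - \Delta y
\end{aligned}
\qquad (11)
$$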
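Equation (11) inverts equation (2), so mapping a point forward with equation (2) and back with equation (11) should return the starting coordinates. The following sketch checks that round trip at the center of the 360x240 key area from the earlier example; the parameter values are arbitrary, and the inversion is undefined when $S_x S_y = 1$.

```python
def forward(x_old, y_old, mx, my, sx, sy, tx, ty):
    """Equation (2): key-field coordinates -> new-field coordinates."""
    return (mx * ((x_old + tx) + (y_old + ty) * sx),
            my * ((y_old + ty) + (x_old + tx) * sy))


def inverse_affine(x_new, y_new, mx, my, sx, sy, tx, ty):
    """Equation (11): new-field coordinates -> key-field coordinates."""
    det = 1.0 - sx * sy
    if det == 0.0:
        raise ValueError("transformation is singular when S_x * S_y == 1")
    x_old = x_new / (mx * det) - sx * y_new / (my * det) - tx
    y_old = y_new / (my * det) - sy * x_new / (mx * det) - ty
    return x_old, y_old


if __name__ == "__main__":
    p = dict(mx=1.02, my=0.97, sx=0.03, sy=-0.01, tx=4.5, ty=-2.0)
    center = (180.0, 120.0)
    moved = forward(*center, **p)
    print(moved, inverse_affine(*moved, **p))
```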

Step 100 is a pre-processing step in which the results of 
steps 70, 80 and 90 are used to pre-process a subsequent field 
in the video sequence. In this manner, the subsequent field is
placed in a condition in which it should more closely match
the key field. The determinations of translation, shear and 
magnification change of the key area from the key field to the 
new field are used to perform an initial de-translation, de- 
shearing and de-magnification of the subsequent field. 

It is to be clearly understood that use of the term “subsequent”
herein to describe a video field does not necessarily
signify that the video field is positioned later in the video
sequence, but is used to signify that the video field is processed
subsequently in the method 10. For example, a “subsequent”
video field may actually be positioned earlier in time
in a video sequence, since a video sequence may be processed 
from back to front (later to earlier in time), from the middle to 
either end, etc. 

In step 102, the image contained in the subsequent field is 
de-translated, that is, it is translated horizontally and vertically
opposite to the respective distances and directions the
key area translated from the key field to the new field as 
determined in step 90. 

In step 104, the image contained in the subsequent field is 
de-sheared, that is, it is sheared in the horizontal and vertical
directions opposite to the angle and direction the key area was 
sheared from the key field to the new field as determined in 
step 80. 

In step 106, the image contained in the subsequent video
field is de-magnified, that is, it is magnified (or reduced in 
magnification) in the horizontal and vertical directions oppo- 
site to the change in magnification of the key area from the 
key field to the new field as determined in step 70. 

Note that FIG. 1 indicates that steps 50-100 are repeated.
These steps are performed for each video field in the video
sequence. Thus, a change in magnification, shear and translation
is determined for the subsequent video field. These
determinations of change in magnification, shear and translation
are then added to the pre-processing change in magnification,
shear and translation applied to the subsequent video
field in steps 102, 104 and 106 to yield a total change in
magnification, shear and translation of the key area from
the key field to the subsequent video field. In a similar manner,
the total change in magnification, shear and translation
determined for the subsequent video field is used to pre-process
the next subsequent video field in the video sequence,
and so on.

The result of these steps is that, for each video field in the
video sequence, a change in magnification, shear and translation
of the key area is determined. The video sequence may
then be modified by de-magnifying, de-shearing and
de-translating each video field in the video sequence, other
than the key field, so that the image contained in the key area
appears motionless and at the same magnification and orientation
through the entire video sequence.

Of course, a person of ordinary skill in the art, upon a
careful consideration of the above description of the method
10, would readily appreciate that modifications, additions,
substitutions, deletions and other changes may be made to the
method as described above and depicted in the accompanying
drawings, which is but a single embodiment of the invention,
and these changes are contemplated by the principles of the 
present invention. Accordingly, the foregoing detailed 
description is to be clearly understood as being given by way 
of illustration and example only, the spirit and scope of the 
present invention being limited solely by the appended 
claims. 

What is claimed is: 

1. A method of stabilizing a video image of interest displayed
in multiple video fields of a video sequence, the
method comprising the steps of: 

subdividing a selected area of a first video field into nested 
pixel blocks including multiple levels of progressively 
smaller pixel block subdivisions, the area containing the 
video image;

determining horizontal and vertical translation of each of 
the pixel blocks in each of the pixel block subdivision 
levels from the first video field to a second video field; 
and 

determining translation of the image from the first video
field to the second video field by determining a change in
magnification of the image from the first video field to
the second video field in each of horizontal and vertical
directions, determining shear of the image from the first
video field to the second video field in each of the horizontal
and vertical directions, and correcting from the
first video field to the second video field the horizontal
and vertical translations of each of the pixel blocks in the
smallest pixel block subdivision for the change in magnification
and shear of the image due to an object being
rotated toward or away from a camera, and averaging the
corrected horizontal and vertical pixel block translations.

2. The method of claim 1, wherein the change in magnification
determining step is performed by:

a) for the horizontal direction, by dividing a relative hori- 
zontal translation of each pair of the pixel blocks in each 
row of the smallest pixel block subdivision by a respec- 
tive distance between centers of the pixel blocks in the 
row pair, to thereby determine a horizontal magnification
for each pixel block row pair; and

b) for the vertical direction, by dividing a relative vertical 
translation of each pair of the pixel blocks in each column
of the smallest pixel block subdivision by a respective
distance between centers of the pixel blocks in the
column pair, to thereby determine a vertical magnifica- 
tion for each pixel block column pair. 

3. The method of claim 2, wherein the change in magnifi- 
cation determining step is further performed as follows: 

a) for the horizontal direction, by calculating an average of
the horizontal magnifications for the pixel block row 
pairs; and 

b) for the vertical direction, by calculating an average of the 
vertical magnifications for the pixel block column pairs. 

4. The method of claim 3, wherein each of the horizontal
and vertical magnification averages is a weighted average,
with greater weight being given to horizontal and vertical
magnifications resulting from corresponding pixel block row
and column pairs having greater distances between centers of 
the respective pixel blocks. 

5. The method of claim 1, wherein the shear determining 
step is performed as follows: 

a) for the horizontal direction, by dividing a relative hori- 
zontal translation of each pair of the pixel blocks in each 
column of the smallest pixel block subdivision by a 
respective distance between centers of the pixel blocks 
in the column pair, to thereby determine a horizontal 
shear for each pixel block column pair; and 

b) for the vertical direction, by dividing a relative vertical 
translation of each pair of the pixel blocks in each row of 
the smallest pixel block subdivision by a respective dis- 
tance between centers of the pixel blocks in the row pair, 
to thereby determine a vertical shear for each pixel block 
row pair. 

6. The method of claim 5, wherein the shear determining
step is further performed as follows: 

a) for the horizontal direction, by calculating an average of 
the horizontal shears for the pixel block column pairs; 
and 

b) for the vertical direction, by calculating an average of the 
vertical shears for the pixel block row pairs. 

7. The method of claim 6, wherein each of the horizontal
and vertical shear averages is a weighted average, with greater 
weight being given to horizontal and vertical shears resulting 
from corresponding pixel block column and row pairs having 
greater distances between centers of the respective pixel 
blocks. 

8. A method of stabilizing a video image of interest displayed
in multiple video fields of a video sequence, the
method comprising the steps of: 

dividing an area of a first video field of the video sequence
into rows and columns of pixel blocks, the area contain- 
ing the image; 

determining a horizontal and vertical translation of each of 
the pixel blocks from the first video field to a second 
video field; and 

calculating a change in magnification of the image in each 
of horizontal and vertical directions from the first video 
field to the second video field, wherein the horizontal
change in magnification and the vertical change in
magnification are used separately to determine
corresponding horizontal and vertical translations of the video
image. 

9. The method of claim 8, wherein the calculating step is
performed by: 

a) for the horizontal direction, by dividing a relative hori- 
zontal translation of each pair of the pixel blocks in each 
row by a respective distance between centers of the pixel 
blocks in the row pair, to thereby determine a horizontal 
magnification for each pixel block row pair; and 

b) for the vertical direction, by dividing a relative vertical 
translation of each pair of the pixel blocks in each col- 
umn by a respective distance between centers of the 
pixel blocks in the column pair, to thereby determine a 
vertical magnification for each pixel block column pair. 

10. The method of claim 9, wherein the change in magni- 
fication determining step is further performed as follows: 

a) for the horizontal direction, by calculating an average of 
the horizontal magnifications for the pixel block row 
pairs; and 

b) for the vertical direction, by calculating an average of the 
vertical magnifications for the pixel block column pairs. 

11. The method of claim 10, wherein each of the horizontal
and vertical magnification averages is a weighted average, 
with greater weight being given to horizontal and vertical
magnifications resulting from corresponding pixel block row 
and column pairs having greater distances between centers of 
the respective pixel blocks. 

12. A method of stabilizing a video image of interest displayed
in multiple video fields of a video sequence, the
method comprising the steps of: 

dividing an area of a first video field of the video sequence 
into rows and columns of pixel blocks, the area contain- 
ing the image; 

determining a horizontal and vertical translation of each of 
the pixel blocks from the first video field to a second 
video field; and 

calculating shear of the image in each of horizontal and 
vertical directions from the first video field to the second 
video field, wherein the shear is due to rotation of an 
object toward or away from a camera. 

13. The method of claim 12, wherein the shear determining
step is performed as follows: 

a) for the horizontal direction, by dividing a relative hori- 
zontal translation of each pair of the pixel blocks in each 
column by a respective distance between centers of the 
pixel blocks in the column pair, to thereby determine a
horizontal shear for each pixel block column pair; and 
b) for the vertical direction, by dividing a relative vertical 
translation of each pair of the pixel blocks in each row by 
a respective distance between centers of the pixel blocks
in the row pair, to thereby determine a vertical shear for
each pixel block row pair. 

14. The method of claim 13, wherein the shear determining
step is further performed as follows:

a) for the horizontal direction, by calculating an average of
the horizontal shears for the pixel block column pairs; 
and 

b) for the vertical direction, by calculating an average of the 
vertical shears for the pixel block row pairs. 

15. The method of claim 14, wherein each of the horizontal
and vertical shear averages is a weighted average, with greater
weight being given to horizontal and vertical shears resulting
from corresponding pixel block column and row pairs having
greater distances between centers of the respective pixel
blocks.



